avoid deadlock on insufficient param bytes in wrpc_runtime_wasmtime
#1045
base: main
Conversation
note: I wasn't sure where to put this test, so I just put it in a standalone file at the root of tests so it's easy to move in case you want it in a specific place
```rust
if diff_len == 0 {
    trace!("consumed empty frame, closing receiver");
    self.as_mut().get_mut().rx.take();
}
```
note: the old code did

```rust
if buf.filled().is_empty() {
    self.rx.take();
}
```

but theoretically, it's possible for `buf` to not be empty when this function is called. I think this is a subtle bug, so I updated it to check the diff (i.e. whether this poll produced new data) instead.
I also think that, similarly, wRPC does not error on extra bytes. This makes some sense from a stream approach (just stop reading data from the stream when you no longer need it), but it is a bit awkward if you want to proactively avoid errors by ensuring you process everything the user sent. We would probably get a fix for this for free if we had a way to denote the end of the stream, though (basically, if we finish parsing and we haven't seen an "end of stream", we error).
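For reference, here is a minimal standalone sketch of the diff-based check discussed above; the `FrameReader` type and its `rx` field are illustrative stand-ins, not the actual wrpc receiver. Because the caller may pass in a `ReadBuf` that already contains bytes, the reader compares the fill level before and after the inner poll instead of checking `buf.filled().is_empty()`.

```rust
use std::io;
use std::pin::Pin;
use std::task::{ready, Context, Poll};

use tokio::io::{AsyncRead, ReadBuf};

// Hypothetical stand-in for the wrpc receiver discussed above.
struct FrameReader<R> {
    rx: Option<R>,
}

impl<R: AsyncRead + Unpin> AsyncRead for FrameReader<R> {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        let this = self.get_mut();
        let Some(rx) = this.rx.as_mut() else {
            // Receiver already closed: report EOF.
            return Poll::Ready(Ok(()));
        };
        let before = buf.filled().len();
        ready!(Pin::new(rx).poll_read(cx, buf))?;
        // `buf` may have been handed to us partially filled, so check how many
        // bytes *this* poll produced rather than whether `buf` is empty overall.
        let diff_len = buf.filled().len() - before;
        if diff_len == 0 {
            // An empty frame: close the receiver.
            this.rx.take();
        }
        Poll::Ready(Ok(()))
    }
}
```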
rvolosatovs
left a comment
Thank you for the contribution!
I acknowledge the problem you are trying to solve; however, I don't think the approach is valid in the general case.
If what we are trying to solve here is preventing "zombie" connections then I think there is a fairly standard way to do it, which is introducing timeouts. For example, here's how wasi-http solves this problem: https://github.com/WebAssembly/WASI/blob/7e643518a4bf9767ca799d7aa776c560a2fc0351/proposals/http/wit/types.wit#L401-L428
If we follow the wasi-http approach, then we actually already have the connect timeout (https://docs.rs/wrpc-transport/latest/wrpc_transport/invoke/trait.InvokeExt.html#method.timeout), but what's lacking is:
- first byte timeout
- between byte timeout
(which really could just be a single timeout at least to start with)
Users would need to wrap the Incoming stream to add this functionality.
With a quick search I found https://docs.rs/tokio-io-timeout/latest/tokio_io_timeout/index.html, which seems to be doing exactly that.
How about we do this:
- Implement `AsyncRead` for `wrpc_transport::invoke::Timeout` (where `T: AsyncRead`)
- Implement `AsyncWrite` for `wrpc_transport::invoke::Timeout` (where `T: AsyncWrite`)
- Implement `wrpc_transport::Index` for `wrpc_transport::invoke::Timeout` (where `T: wrpc_transport::Index`)
- Wrap `Incoming` and `Outgoing` in `Timeout`

wrpc/crates/transport/src/invoke.rs, lines 91 to 92 in 39d34fe:

```rust
type Outgoing = T::Outgoing;
type Incoming = T::Incoming;
```
We could use the tokio-io-timeout crate for the tokio I/O trait implementations if you wish; personally, I would probably implement it directly in wrpc-transport without introducing a dependency.
Once that's done, we could have a follow-up PR, which implements the timeout method on ServeExt and does roughly the same.
How does this sound? Would this address your concern? Is the implementation plan clear?
I'm happy to jump on a call to discuss if you wish. (you can reach me on BA zulip)
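For concreteness, here is a rough standalone sketch of what such a per-read timeout wrapper could look like; the struct, field names, and the `pin_project_lite` dependency are illustrative assumptions, not the actual `wrpc_transport::invoke::Timeout` implementation.

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

use pin_project_lite::pin_project;
use tokio::io::{AsyncRead, ReadBuf};
use tokio::time::{sleep, Instant, Sleep};

pin_project! {
    // Illustrative wrapper: fail a read with `TimedOut` if no progress is made
    // within `timeout` of the previous read (or of construction, for the first byte).
    pub struct TimeoutReader<R> {
        #[pin]
        inner: R,
        timeout: Duration,
        #[pin]
        deadline: Sleep,
    }
}

impl<R> TimeoutReader<R> {
    pub fn new(inner: R, timeout: Duration) -> Self {
        Self {
            inner,
            timeout,
            deadline: sleep(timeout),
        }
    }
}

impl<R: AsyncRead> AsyncRead for TimeoutReader<R> {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        let this = self.project();
        match this.inner.poll_read(cx, buf) {
            Poll::Ready(res) => {
                // The inner stream made progress (or ended/errored): push the deadline out.
                this.deadline.reset(Instant::now() + *this.timeout);
                Poll::Ready(res)
            }
            Poll::Pending => {
                // No data yet: fail the read once the "between byte" deadline elapses.
                if this.deadline.poll(cx).is_ready() {
                    return Poll::Ready(Err(io::ErrorKind::TimedOut.into()));
                }
                Poll::Pending
            }
        }
    }
}
```

Resetting the deadline after every successful inner poll gives a combined first-byte/between-byte timeout, matching the "single timeout at least to start with" idea above.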
```rust
// Track consecutive Pending results. If we get Pending multiple times after having
// attempted to read, it indicates we're waiting for data that will never arrive.
```
I don't think we can make this assumption, and there are a few reasons for it:
- Various async executors are free to poll futures as many times as they want until they return `Ready`. That can be done, in fact, as a valid optimization: polling a future once with a noop task context to short-circuit the case when it's ready, then constructing a "real" task context and polling again to register the task for wakeup.
- The underlying I/O stream is allowed to wake up spuriously. For example, see https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Receiver.html#method.poll_recv. Returning `Poll::Pending` 5 times and then returning `Poll::Ready` would be perfectly valid behavior, which would be disallowed by this PR.

I don't think counting the times that `Pending` was returned is a valid approach in the general case.
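As a small self-contained illustration of this point (not wrpc code), the following future returns `Poll::Pending` several times before completing, which any compliant executor handles fine; a wrapper that errors after two consecutive `Pending` results would reject it.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// A future that returns Pending a fixed number of times before completing.
// This is perfectly legal behavior for a future or an I/O resource.
struct PendingNTimes {
    remaining: u32,
}

impl Future for PendingNTimes {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            return Poll::Ready(());
        }
        self.remaining -= 1;
        // Wake ourselves immediately so the executor polls again,
        // modeling a spurious wakeup.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

#[tokio::main]
async fn main() {
    // Completes on any compliant executor even though it was Pending 5 times in a row.
    PendingNTimes { remaining: 5 }.await;
    println!("completed after several consecutive Pending polls");
}
```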
```rust
outgoing.flush().await?;
let mut buf = vec![];
```
Suggested change:

```diff
 outgoing.flush().await?;
+drop(outgoing);
 let mut buf = vec![];
```
I don't think this usage is valid. By not sending the parameters in the initial request and keeping the stream open, the client signals that it will send parameters asynchronously. This is a fundamental part of the design, which allows us to implement e.g. future and stream types.
Closing the stream (by dropping it, for example), as expected, does not cause a deadlock on current main.
Also, as a general note, please avoid committing Wasm files to this repository.
wRPC deadlocks/hangs indefinitely when a client calls `invoke` on a function via `wrpc-runtime-wasmtime` (ex: `serve_function_shared`) with insufficient encoded data for the expected `params` type. This can occur when the client sends fewer params than expected, or when the encoded bytes otherwise don't match the expected types.

Why this happens
- In the `call` function in `wrpc-runtime-wasmtime`, the params are received as bytes over a stream (`rx`). These are untyped raw bytes.
- The expected parameter types are given by `params_ty`.

You can see the reading from `rx` in a loop according to `params_ty` inside `call` here.

However, if the contents of the stream don't match the expected params (ex: fewer params than expected, or there aren't enough bytes to finish the loop because maybe you passed the wrong type), this for loop will never terminate.
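To make the failure mode concrete, here is a simplified sketch of the shape of that loop; the `ParamTy` enum and `read_params` function are illustrative stand-ins, since the real code decodes values according to the component's wasmtime/wRPC types.

```rust
use std::io;

use tokio::io::{AsyncRead, AsyncReadExt};

// Illustrative stand-in for wRPC's parameter types; the real decoding walks the
// component's types and is considerably richer.
enum ParamTy {
    U8,
    U32,
}

// Shape of the problem: decode one value per expected parameter type from the
// incoming byte stream. If the client sent fewer bytes than `params_ty` requires
// and never closes the stream, one of these reads awaits forever, because the
// underlying `poll_read` keeps returning `Poll::Pending`.
async fn read_params(
    rx: &mut (impl AsyncRead + Unpin),
    params_ty: &[ParamTy],
) -> io::Result<Vec<u64>> {
    let mut params = Vec::with_capacity(params_ty.len());
    for ty in params_ty {
        let value = match ty {
            ParamTy::U8 => u64::from(rx.read_u8().await?),
            ParamTy::U32 => u64::from(rx.read_u32().await?),
        };
        params.push(value);
    }
    Ok(params)
}
```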
Why did this never get caught?

Other than it being an edge case, this bug only appears in the `wrpc-runtime-wasmtime` crate, while most tests use `serve_values`/`invoke_values_blocking`, which do not go through this code path.

Why this matters
This is a denial-of-service (DoS) vulnerability. Malicious actors can exploit this to exhaust server resources (connections, memory, file descriptors) by sending malformed requests that hang indefinitely, making the server unavailable to legitimate clients.
How should it be fixed?

Internally, `read_value` checks which type it should use, then tries to read it (ex: `read_u8().await`). This then calls `poll_read` on the `Incoming` stream. If it's ever unable to parse the value because there aren't enough bytes left, `poll_read` deadlocks in the `Pending` state inside the `ready!` macro.

Example trace
Some solutions that don't work:
- Making `read_u8` (or similar functions) fail: there is no way for `read_value` to know it has reached the end of the data, and certain types can have multiple encodings, so it's hard for it to magically know when it has received enough data for certain data types (wrong place to try to fix this).

Possible solution A: modify `ingress`/`egress` behavior
ingress/egressbehaviorGoal: somehow indicate we've reached the end of the data
calldoes not actually receive data from the network directly. Rather, the data is first read in by aningressin the server that processes the data first.Notably, the connection is established as follows:
a.
serve_function_sharedserves a connection viaConnb.
invokealso connects viaConnThey then communicate via an egress in the client to an ingress in the server.
It's possible there is a way to make the `ingress` on the server side more aware of what is going on in the communication protocol so it can properly indicate EOF (and maybe add other protections, like limits on payload size, timeouts, or whatever other solutions people prefer to protect their server). Currently it's just an infinite loop until it receives enough data, which in turn means `call` also stays blocked waiting for more data from the ingress, which it never receives.

Possible solution B (used by this PR): modify `poll_read`

Instead of trying to redesign the `ingress` (which is probably too big a decision for me to make), I instead implemented a heuristic in `poll_read` to try to avoid deadlocks.
*this.has_read_databranch)Poll::Pendingno matter how many times we try and get data. We assume it's because no data is ever coming, so we error (*this.pending_count > 1branch)Note: this is not perfect, because on slow connections this could cause an error even though the data really was coming. I think this is not too likely, and I'd prefer to accidentally error on slow connections than deadlock on improperly formatted calls.